Scalable parallel algorithms for surface fitting and data mining
Abstract
This paper presents scalable parallel algorithms for high-dimensional surface fitting and predictive modelling, as used in data mining applications. These algorithms are based on techniques such as finite elements, thin plate splines, wavelets and additive models. They all consist of two steps. First, the data is read from secondary storage and a linear system is assembled. Second, the linear system is solved. The assembly can be done with almost no communication, and the size of the linear system is independent of the data size. The presented algorithms therefore scale with both the data size and the number of processors.
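The two-step structure described above can be illustrated with a minimal one-dimensional sketch. It assumes piecewise-linear finite element ("hat") basis functions on a uniform grid over [0, 1], a first-derivative smoothing penalty with weight `lam`, and data split into contiguous chunks that stand in for processors. All function names (`assemble_local`, `reduce_sum`, `add_penalty`, `solve_tridiagonal`, `fit_parallel`, `evaluate`) are hypothetical, and this is not the paper's implementation. Each chunk assembles its own normal equations in one pass over its data, a single element-wise sum combines them (the only communication), and the resulting tridiagonal system is solved with the Thomas algorithm.

```python
from typing import List, Sequence, Tuple

# A tridiagonal matrix stored as three diagonals:
# lower[i] = A[i][i-1], diag[i] = A[i][i], upper[i] = A[i][i+1].
Tridiag = Tuple[List[float], List[float], List[float]]
System = Tuple[Tridiag, List[float]]


def assemble_local(xs: Sequence[float], ys: Sequence[float], m: int) -> System:
    """Step 1 on one (hypothetical) processor: a single pass over its data,
    accumulating B^T B and B^T y for m hat functions on a uniform grid over [0, 1].
    The output has O(m) entries however many points are read."""
    h = 1.0 / (m - 1)
    lower, diag, upper, rhs = [0.0] * m, [0.0] * m, [0.0] * m, [0.0] * m
    for x, y in zip(xs, ys):
        k = min(int(x / h), m - 2)      # element [k*h, (k+1)*h] containing x
        t = (x - k * h) / h             # local coordinate in [0, 1]
        p0, p1 = 1.0 - t, t             # the only two nonzero hat functions at x
        diag[k] += p0 * p0
        diag[k + 1] += p1 * p1
        upper[k] += p0 * p1
        lower[k + 1] += p0 * p1
        rhs[k] += p0 * y
        rhs[k + 1] += p1 * y
    return (lower, diag, upper), rhs


def reduce_sum(parts: List[System]) -> System:
    """The only communication: an element-wise sum of the local O(m) systems."""
    m = len(parts[0][1])
    lower, diag, upper, rhs = [0.0] * m, [0.0] * m, [0.0] * m, [0.0] * m
    for (l, d, u), r in parts:
        for i in range(m):
            lower[i] += l[i]
            diag[i] += d[i]
            upper[i] += u[i]
            rhs[i] += r[i]
    return (lower, diag, upper), rhs


def add_penalty(system: System, lam: float) -> System:
    """Add lam * K, where K[i][j] is the integral of phi_i' * phi_j' (uniform grid)."""
    (lower, diag, upper), rhs = system
    m = len(diag)
    s = lam * (m - 1)                   # lam / h, since h = 1 / (m - 1)
    for k in range(m - 1):              # element k adds s * [[1, -1], [-1, 1]]
        diag[k] += s
        diag[k + 1] += s
        upper[k] -= s
        lower[k + 1] -= s
    return (lower, diag, upper), rhs


def solve_tridiagonal(system: System) -> List[float]:
    """Step 2: Thomas algorithm (Gaussian elimination without pivoting),
    adequate here because the penalised system is symmetric positive definite."""
    (lower, diag, upper), rhs = system
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom         # upper[n-1] is 0, so c[n-1] is 0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u


def fit_parallel(xs: Sequence[float], ys: Sequence[float],
                 m: int, lam: float, n_procs: int) -> List[float]:
    """Assemble each of n_procs contiguous chunks independently, sum, penalise, solve."""
    n = len(xs)
    bounds = [p * n // n_procs for p in range(n_procs + 1)]
    local = [assemble_local(xs[a:b], ys[a:b], m) for a, b in zip(bounds, bounds[1:])]
    return solve_tridiagonal(add_penalty(reduce_sum(local), lam))


def evaluate(coef: Sequence[float], x: float) -> float:
    """Evaluate the fitted piecewise-linear function at x in [0, 1]."""
    m = len(coef)
    h = 1.0 / (m - 1)
    k = min(int(x / h), m - 2)
    t = (x - k * h) / h
    return (1.0 - t) * coef[k] + t * coef[k + 1]


if __name__ == "__main__":
    xs = [i / 999 for i in range(1000)]
    ys = [2.0 * x + 1.0 for x in xs]
    coef = fit_parallel(xs, ys, m=11, lam=1e-6, n_procs=4)
    print(evaluate(coef, 0.5))
```

Because each data point touches only two neighbouring basis functions, the assembled matrix stays tridiagonal with m rows regardless of how many points are read, and fitting with one chunk or with four gives the same coefficients up to rounding. In higher dimensions the grid, and hence the linear system, grows with the resolution and the dimension, but still not with the number of data points.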
Related papers
Scalable Data Mining for Rules
Data Mining is the process of automatic extraction of novel, useful, and understandable patterns in very large databases. High-performance scalable and parallel computing is crucial for ensuring system scalability and interactivity as datasets grow inexorably in size and complexity. This thesis deals with both the algorithmic and systems aspects of scalable and parallel data mining algorithms a...
Efficient Data Mining: Scripting and Scalable Parallel Algorithms
This paper presents our approach to data mining that allows the coupling of parallel applications with a scripting language resulting in an efficient and flexible toolbox. Parallel algorithms which are scalable both in data size and number of processors are a key issue to be able to solve the ever-increasing problems in data mining. On the other hand, data mining applications should be flexible...
Parallel Algorithms for Predictive Modelling
Parallel computing enables the analysis of very large data sets using large collections of flexible models with many variables. The computational methods are based on ideas from computational linear algebra and can draw on the extensive research on parallel algorithms in this area. Many algorithms for the direct and iterative solution of penalised least squares problems and for updating can be ...
SPRINT: A Scalable Parallel Classifier for Data Mining
Classification is an important data mining problem. Although classification is a well-studied problem, most of the current classification algorithms require that all or a portion of the entire dataset remain permanently in memory. This limits their suitability for mining over large databases. We present a new decision-tree-based classification algorithm, called SPRINT, that removes all of the...
Journal title:
- Parallel Computing
Volume 27
Publication date: 2001